Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Large language models have demonstrated robust performance on various
language tasks using zero-shot or few-shot learning paradigms. While being
actively researched, multimodal models that can additionally handle images as
input have yet to catch up in size and generality with language-only models. In
this work, we ask whether language-only models can be utilised for tasks that
require visual input -- but also, as we argue, often require a strong reasoning
component. Similar to some recent related work, we make visual information
accessible to the language model using separate verbalisation models.
Specifically, we investigate the performance of open-source, open-access
language models against GPT-3 on five vision-language tasks when given
textually-encoded visual information. Our results suggest that language models
are effective for solving vision-language tasks even with limited samples. This
approach also enhances the interpretability of a model's output by providing a
means of tracing the output back through the verbalised image content.
Comment: Accepted at ACL 2023 Findings
Combining Textual Features for the Detection of Hateful and Offensive Language
The detection of offensive, hateful and profane language has become a critical challenge, since many users in social networks are exposed to cyberbullying on a daily basis. In this paper, we present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture. The proposed architecture is evaluated on the English Subtask 1A (Identifying Hate, Offensive and Profane Content) of the HASOC-2021 dataset under the team name TIB-VA. We compared different variants of contextual word embeddings combined with character-level embeddings and an encoding of collected hate terms.
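A minimal sketch of the feature-combination idea described above: heterogeneous textual features are concatenated into a single input vector for a classifier. The feature extractors below (hashed character n-grams, a lexicon flag, a stand-in contextual embedding) are illustrative placeholders, not the paper's actual embeddings or lexicon.

```python
import numpy as np

# Hypothetical placeholder lexicon; the paper uses a collected list of hate terms.
HATE_TERMS = {"badword1", "badword2"}

def char_ngram_counts(text, n=3, dim=8):
    """Hashed character n-gram counts as a fixed-size vector (toy stand-in
    for learned character-level embeddings)."""
    vec = np.zeros(dim)
    for i in range(len(text) - n + 1):
        vec[hash(text[i:i + n]) % dim] += 1
    return vec

def hate_term_feature(text):
    """Single indicator: does the post contain a lexicon term?"""
    tokens = set(text.lower().split())
    return np.array([float(bool(tokens & HATE_TERMS))])

def combine_features(text, contextual_vec):
    """Concatenate contextual embedding, character features, and lexicon flag."""
    return np.concatenate([contextual_vec,
                           char_ngram_counts(text),
                           hate_term_feature(text)])

# A 4-d zero vector stands in for a contextual word embedding of the post.
vec = combine_features("an example post", np.zeros(4))
print(vec.shape)  # (4 + 8 + 1,) = (13,)
```

The combined vector would then feed a downstream classifier; the experimental evaluation in the paper ablates each of these building blocks separately.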
Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict
The ongoing Russo-Ukrainian conflict has been a subject of intense media
coverage worldwide. Understanding the global narrative surrounding this topic
is crucial for researchers who aim to gain insights into its multifaceted
dimensions. In this paper, we present a novel dataset that focuses on this
topic by collecting and processing tweets posted by news or media companies on
social media across the globe. We collected tweets from February 2022 to May
2023 to acquire approximately 1.5 million tweets in 60 different languages.
Each tweet in the dataset is accompanied by processed tags, allowing for the
identification of entities, stances, concepts, and sentiments expressed. The
availability of the dataset serves as a valuable resource for researchers
aiming to investigate the global narrative surrounding the ongoing conflict
from various aspects such as who are the prominent entities involved, what
stances are taken, where do these stances originate, and how are the different
concepts related to the event portrayed.
Comment: Dataset can be found at https://zenodo.org/record/804345
Learning Multilingual Semantic Parsers for Question Answering over Linked Data. A comparison of neural and probabilistic graphical model architectures
Hakimov S. Learning Multilingual Semantic Parsers for Question Answering over Linked Data. A comparison of neural and probabilistic graphical model architectures. Bielefeld: Universität Bielefeld; 2019.
The task of answering natural language questions over structured data has received wide
interest in recent years. Structured data in the form of knowledge bases has been available
for public usage with coverage on multiple domains. DBpedia and Freebase are such knowledge
bases that include encyclopedic data about multiple domains. However, querying such
knowledge bases requires an understanding of a query language and the underlying ontology,
which requires domain expertise. Querying structured data via question answering systems
that understand natural language has gained popularity to bridge the gap between the data
and the end user.
In order to understand a natural language question, a question answering system needs
to map the question into a query representation that can be evaluated against a knowledge base.
An important aspect we focus on in this thesis is multilinguality. While most research has
focused on building monolingual solutions, mainly for English, this thesis focuses on building
multilingual question answering systems. The main challenge for processing language input
is interpreting the meaning of questions in multiple languages.
In this thesis, we present three different semantic parsing approaches that learn models
to map questions into meaning representations, specifically into queries, in a supervised
fashion. Each approach differs in the way the model is learned, the features of the model, the
way of representing the meaning and how the meaning of questions is composed. The first
approach learns a joint probabilistic model for syntax and semantics simultaneously from the
labeled data. The second method learns a factorized probabilistic graphical model that builds
on a dependency parse of the input question and predicts the meaning representation that is
converted into a query. The last approach presents a number of different neural architectures
that tackle the task of question answering in end-to-end fashion. We evaluate each approach
using publicly available datasets and compare them with state-of-the-art QA systems.
Named Entity Recognition and Disambiguation using Linked Data and Graph-based Centrality Scoring
Hakimov S, Oto SA, Dogdu E. Named Entity Recognition and Disambiguation using Linked Data and Graph-based Centrality Scoring. In: SIGMOD, SWIM 2012. 2012: 4.
Named Entity Recognition (NER) is a subtask of information extraction and aims to identify atomic entities in text that fall into predefined categories such as person, location, organization, etc. Recent efforts in NER try to extract entities and link them to linked data entities. Linked data is a term used for data resources that are created using semantic web standards, such as DBpedia. There are a number of online tools that try to identify named entities in text and link them to linked data resources. Although one can use these tools via their APIs and web interfaces, they use different data resources and different techniques to identify named entities, and not all of them reveal this information. One of the major tasks in NER is disambiguation, that is, identifying the right entity among a number of entities with the same name; for example, "apple" standing for both "Apple, Inc." the company and the fruit. We developed a similar tool called NERSO, short for Named Entity Recognition Using Semantic Open Data, to automatically extract named entities, disambiguate them, and link them to DBpedia entities. Our disambiguation method is based on constructing a graph of linked data entities and scoring them using a graph-based centrality algorithm. We evaluate our system by comparing its performance with two publicly available NER tools. The results show that NERSO performs better.
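The graph-based disambiguation idea can be sketched as follows: candidate entities for each surface form become graph nodes, edges link candidates that are connected in the knowledge base, and the candidate with the highest centrality wins. This is an illustrative sketch using degree centrality on a toy graph, not NERSO's actual implementation; the entity identifiers are hypothetical DBpedia-style labels.

```python
def degree_centrality(graph):
    """Degree centrality: fraction of other nodes each node is linked to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def disambiguate(candidates, graph):
    """Pick, per surface form, the candidate entity with the highest centrality."""
    scores = degree_centrality(graph)
    return {surface: max(cands, key=lambda c: scores.get(c, 0.0))
            for surface, cands in candidates.items()}

# Toy example: "apple" is ambiguous between the company and the fruit; in a
# context mentioning Steve Jobs and the iPhone, the company is better connected.
graph = {
    "dbr:Apple_Inc.": {"dbr:Steve_Jobs", "dbr:IPhone"},
    "dbr:Apple": set(),                      # the fruit: no links in this context
    "dbr:Steve_Jobs": {"dbr:Apple_Inc."},
    "dbr:IPhone": {"dbr:Apple_Inc."},
}
candidates = {"apple": ["dbr:Apple_Inc.", "dbr:Apple"],
              "jobs": ["dbr:Steve_Jobs"]}
print(disambiguate(candidates, graph))
```

The paper scores a graph of linked data entities; any centrality measure (degree, PageRank, betweenness) fits the same skeleton.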
Classification of important segments in educational videos using multimodal features
Videos are a commonly-used type of content in learning during Web search. Many e-learning platforms provide quality content, but sometimes educational videos are long and cover many topics. Humans are good at extracting important sections from videos, but this remains a significant challenge for computers. In this paper, we address the problem of assigning importance scores to video segments, that is, how much information they contain with respect to the overall topic of an educational video. We present an annotation tool and a new dataset of annotated educational videos collected from popular online learning platforms. Moreover, we propose a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features. Our experiments investigate the impact of visual and temporal information, as well as the combination of multimodal features, on importance prediction.
TIB's visual analytics group at MediaEval '20: Detecting fake news on corona virus and 5G conspiracy
Fake news on social media has become a hot topic of research as it negatively impacts the discourse of real news in public. Specifically, the ongoing COVID-19 pandemic has seen a rise of inaccurate and misleading information due to the surrounding controversies and unknown details at the beginning of the pandemic. The FakeNews task at MediaEval 2020 tackles this problem by creating a challenge to automatically detect tweets containing misinformation based on text and on the structure of the Twitter follower network. In this paper, we present a simple approach that uses BERT embeddings and a shallow neural network for classifying tweets using only text, and discuss our findings and the limitations of the approach in text-based misinformation detection.
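A minimal sketch of the "shallow neural network over BERT embeddings" setup: one hidden layer on top of precomputed sentence embeddings, with a sigmoid output for the misinformation class. The dimensions, initialisation, and random stand-in embeddings are assumptions for illustration, not the configuration reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(dim=768, hidden=64):
    """Weights for one hidden layer on top of a fixed sentence embedding."""
    return {"W1": rng.normal(0.0, 0.02, (dim, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, 0.02, (hidden, 1)), "b2": np.zeros(1)}

def predict_proba(params, X):
    """ReLU hidden layer, sigmoid output: P(tweet contains misinformation)."""
    h = np.maximum(0.0, X @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    return (1.0 / (1.0 + np.exp(-logits))).ravel()

params = init_params()
embeddings = rng.normal(size=(3, 768))   # stand-in for 768-d BERT tweet embeddings
print(predict_proba(params, embeddings).shape)  # one probability per tweet
```

Training would fit the two weight matrices with binary cross-entropy; only the shallow head is learned, the BERT encoder stays frozen as a feature extractor.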
Check square at CheckThat! 2020: Claim Detection in Social Media via Fusion of Transformer and Syntactic Features
In this digital age of news consumption, a news reader has the ability to react, express and share opinions with others in a highly interactive and fast manner. As a consequence, fake news has made its way into our daily lives because of the very limited capacity of large companies as well as individuals to verify news on the Internet. In this paper, we focus on solving two problems which are part of the fact-checking ecosystem and can help to automate fact-checking of claims in an ever-increasing stream of content on social media. For the first problem, claim check-worthiness prediction, we explore the fusion of syntactic features and deep transformer Bidirectional Encoder Representations from Transformers (BERT) embeddings to classify the check-worthiness of a tweet, i.e. whether it includes a claim or not. We conduct a detailed feature analysis and present our best performing models for English and Arabic tweets. For the second problem, claim retrieval, we explore pre-trained embeddings from a Siamese network transformer model (sentence-transformers) specifically trained for semantic textual similarity, and perform KD-search to retrieve verified claims with respect to a query tweet.
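The claim-retrieval step can be sketched as nearest-neighbour search over embedding vectors. The paper uses sentence-transformers embeddings with KD-search; this illustration substitutes toy 3-d vectors and a brute-force cosine-similarity scan, which returns the same neighbours a KD-tree would on these inputs.

```python
import numpy as np

def normalize(v):
    """L2-normalise vectors so the dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve(query_vec, claim_vecs, k=1):
    """Return indices of the k verified claims most similar to the query tweet."""
    sims = normalize(claim_vecs) @ normalize(query_vec)
    return np.argsort(-sims)[:k]

# Toy stand-ins for embeddings of three verified claims and one query tweet.
claims = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.05, 0.0])
print(retrieve(query, claims, k=2))
```

In practice the claim and query vectors would come from the same pre-trained sentence-transformers model, and a tree or approximate index replaces the brute-force scan for large claim collections.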